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ABSTRACT 



Aims. This paper tackles important aspects of comets dynamics from a statistical point of view. Existing methodology uses numerical 
integration for computing planetary perturbations for simulating such dynamics. This operation is highly computational. It is reason- 
able to wonder whenever statistical simulation of the perturbations can be much more easy to handle. 

Methods. The first step for answering such a question is to provide a statistical study of these perturbations in order to catch their 
main features. The statistical tools used are order statistics and heavy tail distributions. 

Results. The study carried out indicated a general pattern exhibited by the perturbations around the orbits of the important planet. 
These characteristics were validated through statistical testing and a theoretical study based on Opik theory. 

Key words. Methods: statistical; Celestial mechanics; Oort cloud 
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Comet dynamics is one of the most difficult phenomena to 
model in celestial mechanics. Indeed their dynamics is strongly 
chaotic, thus individual motions of known comets are hardly re- 
producible for more than a few orbital periods. When the origin 
of comets is under investigation, one is thus constrained to make 
use of statistical tools in order to model the motion of a huge 
number of comets supposed to be representative of the actual 
population. Such statistical model should also be reliable on a 
time scale comparable to the age of the solar system. 

Due to their very elongated shapes, comet trajectories are af- 
fected by planetary perturbations during close encounters with 
planets. Such perturbations turn out to be the main mechanisms 
able to affect comet trajectories. Consequently, it is of major im- 
portance to model these perturbations in a way which is statisti- 
cally reliable and with the lowest cost in computing time. 

A direct numerical integration of a 6 bodies restricted prob- 
lem (Sun, Jupiter, Saturn, Uranus, Neptune, Comet) each time 
a comet enters the planetary region of the Solar System is not 
possible due to the cost in computer time. 

Looking for an alternative approach, we can take advantage 
of the fact that planetary perturbation on Oort cloud comets 
are uncorrected. In fact the orbital period of such comets are 
so much larger than those of the planets, that when the comet 
returns, the phases of the latter can be taken at random. Thus 
we can build a synthetic inte grator a la Froeschle and Rickman 
(Froeschle & Rickman 1981) to speedup the modeling. The crit- 
icism by dFouchard et alj|2003l) to such an approach does not ap- 
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ply in the present case because, as just said, successive planetary 
perturbations on an Oort cloud comets are uncorrelated. 

The aim of this paper is to give a statistical description of 
a large set of planetary perturbations assumed to be representa- 
tive of those acting on Oort cloud comets entering the planetary 
region. To this purpose we use order statistics and heavy tails 
distributions. 

The rest of this paper is organised as follow. Section 2 is 
devoted to the presentation of the mechanism producing the data, 

1. e. the planetary perturbations and the statistical tools used to 
analyse the data. These tools are order statistics and heavy-tail 
distributions, that allow, respectively, the study and the modeling 
of the data distribution, with respect to its symmetry, skewness 
and tail fatness. The obtained results are shown and interpreted 
in the third section. The results are finally analysed from a more 
theoretical point of view using the Opik theory in Section 4. The 
paper closes with conclusions and perspectives. 

2. Statistical tools 

2.1. Data compilation 

By planetary perturbations, one intends the variations of the or- 
bital parameters between their values before entering the plan- 
etary region of the Solar System, i.e. the barycentric orbital 
element of the osculating cometary orbit (z;, qu cos z',-, o>i, Q,) T 
(where q, i, w, Q are the perihelion distance, the inclination, the 
argument of perihelion and the longitude of the ascending node 
and z = -I I a with a the semi-major axis), and their final val- 
ues (zf,qf,cosif,a>f,Q.f) T , that is either when the comet is at 
its aphelion or when it is back on a keplerian barycentric orbit. 

Between its initial and final values, the system Sun + Jupiter 
+ Saturn + Uranus + Neptune + com et is integrate d using the 
RADAU integrator at the 15th order dEverha rt 1985) for a max- 
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imum of 2 000 yrs. Then the planetary perturbation obtained 
through this integration is (Az = Zf — Zt,Aq = qj - qi, A cos i 
cos if - cos /,, Aw = at/ - u>i, AQ. - £2/ - Q,) r . The detail on the 
numeric al experiment used to perform the integrations may be 
found in lRickman et all d200ll) . 

Repeating the above experiment with a huge number of 
comets (namely 9 600 000), one gets a set of planetary pertur- 
bations. The comets are chosen with uniform distribution of the 
perihelion distance between and 32 AU, cosine of the ecliptic 
inclination between -1 and 1 and argument of perihelion, longi- 
tude of the ascending node between and 360°. The initial mean 
anomaly is chosen such that the perihelion passage on its initial 
keplerian orbit occurs randomly with an uniform distribution be- 
tween 500 and 1 500 years after the beginning of the integration. 

In the present study, b ecause the pertu rbations are mainly 
depending on q, and cos z'; dFernandezll 19811) . each perturbation 
is associated to the couple (cos qi). Similarly, since the orbital 
energy is the main quantity which is affected by the planetary 
perturbations, we will consider only these perturbations here. 

Consequently, our data are composed by a set of triplets 
(cos iu qi,Z) where Z = zj - Zi denotes the perturbations of the 
cometary orbital energy by the planets, and (cos /,, qf) a point in a 
space denoted by K. In the following, we call Z the perturbation 
mark. 

2.2. Exploratory analysis based on order statistics 

Let Zi , . . . , Z„ be a sequence of independent identically dis- 
tributed random variables and let F(z) = P(Z < z),z e R be the 
corresponding cumulative distribution function. Let us consider 
also 2„, the set of permutations on { 1 , . . . , «}. 

The order statistics of the sample (Z\, . . . ,Z„) is the rear- 
rangement of the sample in increasing order and it is denoted 
by (Z(i.„), . . . ,Z(„ „)). Hence Z(i : „) < . . . , < Z ( „,„) and there exists 
a random permutation <x„ e S„ such that 



towards z q as it follows 
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In the foll owing, some classical results from the literature 
are presented flDavidl IT 98 lb iDelmas & Jour dain 2006). If F is 
continuous, then almost surely Z(i >n ) < . . . , < Z(„ „) and the per- 
mutation <x„ in definition OJ is unique. If Z\ has a probability 
density /, then the probability density of the order statistics is 
given by 

n\l{z l <...Z n }f(z l )...f(Zn). 

A major characteristic of order statistics is that they allow 
quantiles approximations. The quantiles are one of the most easy 
to use tool for characterising a probability distribution. In prac- 
tice, the data distribution can be described by such empirical 
quantiles. 

Two important results are now presented. The first result 
shows how to compute empirical quantiles using order statistics. 
Let us assume that F is continuous and there exists an unique 
solution z q to the equation F(z) = q with q e (0, 1). Clearly, 
z q is the g-quantile of F. Let (k(n),n > 1) be an integers se- 
quence such that 1 > k(n) > n and lim^oo — = q. Then the 
sequence of the empirical quantiles (Z^{n),n),n > 1) converges 
almost surely towards z q . 

The second result allows the computation of confidence in- 
tervals and hypothesis testing. If Z\ has a continuous probability 
density / such that f(z g ) > for q € (0, 1) and if it is supposed 
that k(n) = nq + o( then Z(k(n),n) converges in distribution 



^/n(Z(k(n)ji) - Z q ) ^ N f 0, -f- 
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The exploratory analysis we propose for the perturbation 
data sets is based on the computation of empirical quantiles. 
There are several reasons motivating such a choice. First, there 
is not too much a priori knowledge concerning the perturbations 
marks, except that they are distributed around zero and that they 
are uniformly located in K. This implies that very few hypothe- 
sis with respect to the data can be done. Clearly, in order to apply 
such an analysis the only assumptions needed are the conditions 
of validity for the central limit theorem. From a practical point of 
view, an empirical quantiles based analysis allows for checking 
the tails, the symmetry and the general spatial pattern of the data 
distribution. From a theoretical point of view, the mathematics 
behind this tool allow a rather rigorous analysis. 

2.3. Stable distributions models 

Stable laws are a rich class of probability distributions that allow 
heavy tails, skewness and have many nice mathematical proper- 
ties. They are also known in the literature under the name of a- 
stable, stable Paretian or Levy stab le distributions. These mod- 
els were introduced bv lLevvl ( ll925l) . In the followin g some basic 
notions and results on stable distributions are give n dBorak et al.l 
120051: lFelleilll971b ISamorodnitskv & Taaaull 19941) . 

A random variable Z has a stable distribution if for any 
A,B >0, there is a C > and D e R 1 such that 



AZ] + BZ 2 =CZ + D, 



where Z\ and Z2 are independent copies of Z, and "=" denotes 
equality in distribution. 

A stable distribution is characterised by four parameters 
a e (0,2], ft € [-1, 1], y > and 5 e R 1 and it is denoted 
by S a (J3, y, S). The role of each parameter is as it follows : a de- 
termines the rate at which the distribution tail converges to zero, 
/3 controls the skewness of the distribution, whereas y and 6 are 
the scale and shift parameters, respectively. Figure Q] shows the 
influence of these parameters on the distribution shape. 

The linear transformation of stable random variable is also a 
stable variable. If a 6 (0, 2), then E\Z\ P < 00 for any < p < a 
and E\Z\ P = 00 for any p > a. The distribution is Gaussian if 
a — 2. The stable variable with a < 2 has an infinite variance and 
the correspondi ng distribution tail s are asymptotically equivalent 
to a Pareto law ( Skorokho dl 1961b . More precisely 
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,z a nz>z} = 



a+fi) 



lim^oo z a P{Z<-z) = *b®cr. 



where <x = C a y a , C a = — r-^-, — ttt if a + 1 and C a 
elsewhere. The distribution is symmetric whenever /3 = 



(2) 



0, or 

skewed otherwise. In the case a < 1, the support of the distri- 
bution S a (J3, y, 0) is the positive half-line when /3 — 1 and the 
negative half-line when [} = -Lifer > 1, then the first order 
moment exists and equals the shift parameter 6. 

One of the technical difficulty in the study of stable distribu- 
tion is that except for a few cases (Gaussian, Cauchy and Levy), 
there is no explicit form for the densities. The characteristic 
function can be used instead, in order to describe the distribution. 
There exist numerical methods able to approximate the proba 
bility density and the cumulative distribution functions 



proba- 
dNolanl 



Stoica et al.: Order statistics and heavy-tail distributions 



3 



a) 




b) 




Fig. 1. Influence of the parameters on the shape of a stable distribution : a) parameter, b) a, y and 6 parameters. 



1 19971) . Simulat ion algorithms for sampling stable distrib ution 
can be found in lBorak et ail d2005l) : IChambers et all dl976l) . 

Due to the previous considerations, parameter estimation 
is still an open and chall enging problem. Several methods are 
available in the liter a ture dFama & Rolllll 97 lHMcCullochll 19861: 
iMittnik et al] 119991: iNolanl 120011: iPressI |1972|) . Nevertheless, 
these methods have all the same drawback, in the sense that the 
data is supposed to be a sample of a stable law. It is a well known 
fact, that if the data comes from a different distribution, the in- 
ference of the tail index may be strongly mislea ding. A so lution 
to this problem is to estimate the tail exponent (Hill 1975b and 
then estimate distribution parameters if a e (0, 2]. 

Still, it remains to solve the problem of parameter estima- 
tion whenever the tail exponent is greater than 2. Under these 
circumstances, distributions with regularly varying tails can be 
considered. A random variable has a distribution with regularly 
varying tails of index a > if there exist p,q>Q,p + q- 1 and 



a slowly varying function L, i.e lim z ^ M = 1 for any A > 0, 
such that 

' lim z ^ co z a L(zW{Z> z} = p, 

linv^ z a L(z)¥{Z < -z) = q. 



(3) 



It is important to notice that the conditions (O can be ob- 
tained from (f3]) whenever L(z) — l/cr and p = (1 + f3)/2. 

The parameter es t imatio n algorithm proposed 

by iDavvdov & Paulauskasl d 19991 l2004i) is constructed un- 
der the assumption that the sample distribution has the 
asymptotic property (f2j). The algorithm gives three estimated 
values a,/?,?. The 6 can be computed easily whenever a > 1, 
by approximating it using the empirical mean of the samples. 
This parameter estimation method can be used for stable 
distribution and in this case, <J should indicate positive values 
lower than 2. In the same time, the strong point of the method 
is that it can be used for data not following stable distributions. 
In this case the data distribution is assumed to have regularly 
varying tails. The weak point of this algorithm is that in this 
case, it does not give indications concerning the body of the 
distribution. Nevertheless, in both cases, this method allows a 
rather complete characterisation of a wide panel of probability 
distributions. The code implementing the algorithm is available 
just by simple demand to the authors. 

3. Results 

3. 1 . Empirical quantiles 

The lack of stationarity of the perturbations marks imposes the 
partitioning of the location space in a finite number cells. Let us 
consider such a partition K = U" =l Ki. The cells K, are disjoint 



and they all have the same volume. The size of volume has to 
be big enough in order to contain a sufficient number of pertur- 
bations. In the same time, the volume has to be small enough to 
allow stationarity assumptions for the perturbations marks inside 
a cell. After several trial and errors, we have opted for a partition 
made of square cells Ki, having all the same volume 0.1x0.1 AU, 
so that each cell contains about 1 500 perturbations. 

We were interested in three questions concerning the pertur- 
bations marks distributions. The first two questions are related 
to the tails and the symmetry of the data distribution. The third 
question is related to a more delicate problem. It is a well known 
fact that the perturbations locations follow an uniform distribu- 
tion in K. Nevertheless, much few is known about the spatial dis- 
tribution of the perturbations marks, except that they are highly 
dependent on their corresponding locations. So, the third ques- 
tion to be formulated is the following : do the distributions of the 
perturbations marks exhibit any pattern depending on the pertur- 
bation location ? 

For this purpose, empirical g-quantiles were computed in 
each cell. The most part of these values were indicating that the 
perturbations marks are distributed around the origin, while no 
particular spatial pattern is exhibited in the perturbation location 
space. 

On the other hand, the situation is completely different for 
extremal g-values such as : 0.01,0.05,0.95,0.99. These quan- 
tiles were indicating rather important values around the semi- 
major axis of each planet. In order to check if these values 
may reveal heavy tail distributions, the difference based indicator 
Zq—riq was built. The first term of this indicator represents an em- 
pirical g-quantile. The second term is the theoretical g-quantile 
of the normal law with mean and standard deviation given by 
Zo.50 an d 0.5(zo.84 - zo.i6)- Hence, for values of q approaching 1, 
positive values of the indicator may suggest heavy-tail behaviour 
for the data. Clearly, this indicator may be used also for quantiles 
approaching 0. In this case, it is the negative sign that reveals the 
fatness of the distribution tail. 

In Figure [2] the values obtained for the difference indicator 
Z0.99 - «o.99 are shown. It can be observed that its rather impor- 
tant values appeared whenever the perturbations are located in 
the vicinity of a planet orbit. All these values tend to form a spa- 
tial pattern similar to an arrow-like shape. As it can be observed, 
this shape is situated around the planet orbit and it is pointing 
from the right to left. It tends to vanish, while the cosine of the 
inclination angle approaches -1. The prominence of this arrow 
shape clearly depends on the closest planet : bigger the planet is, 
sharper is the arrow-like shape. This can be observed by looking 
at the change of values for the difference indicator with respect 
the size of the planet. These observations fulfil some good sense 
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expectations : the comets perturbations tend to be more impor- 
tant whenever a comet cross the orbit of a giant planet. 

Since these phenomena are observed for extremal 
g-quantiles, they indicate that the distribution tails may 
be an important feature for the data. Hence, a statistical model 
for the data should be able to catch these characteristics of the 
perturbations marks. 

Empirical quantiles can be also used in straightforward way 
as symmetry indicators of the data distribution. Clearly, by just 
checking whenever the difference z q - \zi- q \ tends to 0, this may 
suggest a rather symmetric data distribution. Figure [3] shows 
the computation of such differences for each data cell. The 
values obtained are rather small all over the studied region. 
Nevertheless, there are some regions and especially around the 
Jupiter's orbit we may suspect the data distributions are a little 
bit skewed. Still, since the perturbations have rather small nu- 
merical values, assessing symmetry using the proposed indicator 
has to be done cautiously. 

It is reasonable to expect a more reliable answer concerning 
this question by using a statistical model. Clearly, such a model 
should be able to catch the symmetry of the data distribution as 
well. 




n 



a) 

Fig. 3. Exploring symmetry using empirical quantiles difference 
Z0.99 - ko.oi I for the perturbations marks around Jupiter. Axis are 
as for Fig. [2] 

The central limit theorem available for the order statistics al- 
lows the construction of an hypothesis test. Since our analysis 
leads us towards heavy-tailed distributions models, as a precau- 
tion, a statistical test was performed to verify if a rather sim- 
pler model can be fitted to the data. The normality assumption 
was considered as null hypothesis for the test. The test was per- 
formed for the data in each cell, by considering that the normal 
distribution parameters are given by the empirical quantiles as 
expained previously. The p-values were computed using a y 1 
distribution. In this context, the local normality assumption for 
the perturbation marks is globally rejected. Figure [4] shows the 
result of testing the normality of the zo.95 empirical quantile com- 
puted around the Jupiter's orbit. 

Indeed, there exist regions where the normality assumptions 
cannot be rejected for the considered quantile. Still, the regions 
where this hypothesis is rejected clearly indicate that normality 
cannot be assumed entirely. Therefore, a parametric statistical 
model has to be able to reflect this situation : indicate whenever 
is the case how "heavy" or how stable are the distributions tails. 

The only parameter used during this exploratory analysis 
was the partitioning of the location domain K. There is one more 
question to answer : do the obtained results depend on the pat- 
terns exhibited by the data, or they are just a consequence of 
the partitioning in cells of the data locations ? To answer this 




Fig. 4. p-values computed for testing the normality of the em- 
pirical quantiles zo.95 around Jupiter. 



question, a bootstrap procedure and a permutation test were im- 
plemented (Davison & Hinklevlll997l) . 

Bootstrap samples were randomly selected by uniformly 
choosing 20% from the entire perturbations data set. Difference 
indicators were computed for this special data set. This opera- 
tion was repeated 100 times. At the end of the procedure, the 
empirical means of the difference indicators were computed. In 
Figure^ the bootstrap mean of the indicator zo.99 - «o.99 around 
Jupiter's orbit is showed. As expected, the same pattern is ob- 
tained as in Figure^ : important values are grouped around the 
planet's orbit while exhibiting an arrow-like shape pointing from 
right to left. 

The permutation test follows the same steps as the bootstrap 
procedure except that the perturbations are previously permuted. 
This means that all the perturbations are modified as it follows : 
for a given perturbation, its mark is kept while its location is 
exchanged with the location of another randomly chosen per- 
turbation. This procedure should destroy any pre-existing struc- 
ture in the data. In this case, we expect that applying a boot- 
strap procedure on this new data set will indicate no relevant pat- 
terns. In Figure|5p the result of such permutation test is showed. 
The experiment was carried out in the vicinity of Jupiter's orbit. 
After permuting the perturbations as indicated, the previously 
described bootstrap procedure was applied in order to estimate 
bootstrap means of the difference indicator zo.99 — «b.99- The re- 
sult confirmed our expectations, in the sense that no particular 
structure or pattern is observed. This clearly indicates, that the 
analysis results were due mainly to the original data structure 
and not to the partitioning of the perturbations location domain 
in cells. 

In the same time, the permutation test is also a verification 
of the proposed exploratory methodology. This methodology de- 
pends on a precision parameter for characterising the hidden 
structure or pattern exhibited by the data. Still, whenever such 
a structure does not exist at all, the present method detects noth- 
ing. 

3.2. Inference using heavy-tail distributions 

The empirical observations of the perturbations marks distribu- 
tions indicated fat tails and skewness behaviour. This leptokurtic 
character of the perturbation distributions was observed espe- 
cially in the vicinity of the planets orbits. In response to this em- 
pirical evidence heavy-tail distribution modelling was chosen. 

The same cell partitioning as for the exploratory analysis is 
maintained. The previously mentioned algorithm for estimating 
stable laws parameters was run for the data in each cell. 
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Fig. 2. Empirical quantiles based difference indicator za.99 - M0.99 f° r the perturbation marks around the big planets : a) Jupiter, b) 
Saturn, c) Uranus d) Neptune. For each diagram the y-axis correspond to initial perihelion distance in AU, and the .x-axis to cosine of 
the inclination. We recall that the respective semi-major axis of the four giant planets are: aj - 5.2 AU, as =9.6 AU, ajj = 19.2 AU, 
fliv = 30.1AU. 




y 



n 



y 



a) 



b) 



Fig. 5. Validation of the analysis based on the computation of the difference indicator zo.99 - «o.99 around Jupiter : a) bootstrap 
procedure ; b) permutation test. 



In Figure [6] the estimation result of the tail exponent is 
shown. Clearly, it can be observed a region formed by the cells 
corresponding to estimated a values lower than 2. This kind of 
region may be located around each orbit corresponding to a big 
planet. The shape of this region is less picked than the region 
obtained using empirical quantiles. Still, the two results are co- 
herent. Both results indicate that the heavy-tailed character of the 
perturbations distributions exhibits a spatial pattern. This spatial 
pattern is located around the orbits of the big planets. 

The skewness of the data distribution can be analysed by 
looking at the results shown in Figure [7] Indeed, it can be ob- 
served that there are cells containing perturbations following a 
skewed distribution. The obtained results indicate neither the 
presence of a pattern by such distributions, nor the presence of 
such a pattern around the orbits of the big planets. 

The estimation results for the cr and 5 parameters are pre- 
sented in Figure[8] The scale parameter indicates how heavy are 
the distribution tails. In Figure [8^, it may be observed that the 
most important values of cr tend to form a spatial pattern sim- 
ilar with the patterns formed by the difference indicator based 
on order statistics and the tail exponent, respectively. The results 
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Fig. 7. Estimation result of the skewness parameter /3 for the per- 
turbations marks around Jupiter. 



obtained for the 6 parameter indicate that a shift of the perturba- 
tion may exist around the orbit of the corresponding big planets. 

In order to check these results a statistical test using the cen- 
tral limit theorem for order statistics was built. Clearly, this result 
can be used in order to verify if the empirical quantiles from a 
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Fig. 8. Estimation result of the scale parameter cr and shift parameter 5 for the perturbation marks around Jupiter. 



cell are coming rather from the distribution characterised by the 
parameters previously estimated. Figure [9] shows the result of a 
test verifying that the zo.99 quantiles around the Jupiter's orbit 
are originated from a heavy-tail distribution, while the quantiles 
outside this region are coming rather from Pareto distribution. 
It can be observed that high values for the p-values are spread 
around the entire region : for 81.5% of the cells we cannot re- 
ject the null hypothesis. Clearly, this result shows a far better 
characterisation of the distribution tails of the perturbations than 
the test for the normality assumption performed in the preceding 
section. 




The previous test certifies the perturbations distributions tails 
exhibit a stable or regular variation behaviour. If the perturba- 
tions are close to the orbit of a big planet then they have rather 
a stable behaviour. Figure [TOl shows the p-values of a^ 2 -test 
implemented for the perturbations with estimated tail exponent 
a < 2. This test allows to check the perturbations also for their 
distribution body. It can be observed that almost in all these re- 
gions the assumption of stable distributions is not rejected. 



Fig. 9. /^-values computed for testing if the empirical quantiles 
Zo.99 around the Jupiter's orbit are originated from a heavy-tail 
distribution. 



was considered for modelling. Its expressions is given below : 



For the perturbations with a tail exponent greater than 2, an 
alternative family of distributions with regularly varying tails 



r 



1 + I KZ - 0) I 



(4) 
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with C K . a the normalising constant, k the scale parameter, ai the 
location parameter and a the tail exponent. 

The parameter estimation for such distributions was done 
in several steps. First, the tail exponent a was considered ob- 
tained from the previous algorithm. Second, the location param- 
eter ai was estimated by the empirical mean of the data samples. 
Finally, the normalising constant C K$a and the scale parameter k 
were estimated using the method of moments. 

A x 2 statistical test was done for the perturbations with 
a > 2. The null hypothesis considered was that the considered 
perturbations follow a regularly varying tails distribution (HJl 
with parameters given by the previously described procedure. 
The obtained /^-values are shown in FigureQT] It can be noticed 
that in the majority of considered cells the null hypothesis is not 
rejected. 

4. Discussion and interpretation 

Some of the features present in the Figures can be explained 
in th e framework of the analytical theo r y of close encoun- 
ters dQpikl Il976t iGreenberg et alJ Il988t ICarusi et alj Il990t 
IValsecchi & Manarall 19971) . 

Let us consider the magnitude of the perturbations in the 
vicinity of a = aj = 5.2 AU (Jupiter). The colour coding of 
the Figure [2] represents the magnitude P of the perturbation, 
corresponding to 



Z = 1 oc hf — hi 

af a. 



(5) 



where a and h are respectively the orbital semi-major axis and 
the orbital energy of the heliocentric keplerian motion of the 
comet. The subscripts , and f stand, respectively, for initial and 
final, i.e., before and after the interaction with Jupiter). 

Perturbations at planetary encounters are characterised 
by large and in general asymmetric tails, as was shown 



by various authors dEverhartlll969l : lOikawa & Everhartlll979l : 
iFroeschle & Rickmanlll98l[): an analytic al expl anation of these 
feature s was given by lCarusi et al.1 d 19901) and bv lValsecchi et al.l 
d2000h . and the c onsequences on the orbital e volution of comets 
was discussed by Valsecchi & M anaral dl997f) . 

Let us consider the case of parabolic initial orbits (our orbits 
are in fact very close to parabolic). In the q-cos i plane, the con- 
dition for the tails of the energy perturbation distribution to be 
symmetric is: 



= 



1-3+2 ■ s j2q/a p cos i 
2 ^3 — 2 ^2q/a p cos/ 



where a p is the orbital semi major axis of the planet encountered. 

However, the finite size of the available perturbation sam- 
ple must be taken into account, as the tails would become suf- 
ficiently populated to show any asymmetry only for very large 
samples. 

A way to take this effect into account is to consider that in 
different regions of the q-cos i plane the probability p for the 
comet on a parabolic orbit to pass within a give n unperturbe d 
distance b from the planet would be, according to lOpikl dl976l) : 



P = 



^3-2 ^j2~q~fL 



dp n sin i J2 — 2q/a p 



To take into account the size of the perturbation, we consider 
that the angle y by which the planetocentric velocity of the comet 
is rotated is given by: 



y a p m p 

tan - = - - == r ; 

2 bm & (3—2 ^2qja p cos i) 
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Fig. 11. p-values of ax 2 statistical test for the perturbations with a > 2 around the big planets : a) Jupiter, b) Saturn, c) Uranus, d) 
Neptune. 



we then define a function / as: 

f = Pi»*\ (6) 

bm p 

a p m Q 7T sin i ^J2 — 2q/a p ^3 — 2 yj2q/a p cos"/ 

Figure [12] shows the level curves of /; as can be seen, in 
it are reproduced the main features of Figure [2] The arrow-like 
shape observed during the statistical study can be now observed 
on the definition domain imposed by ©. This strenghten our 
interpretation of the features of Fig. |2]as due to the geometry of 
close approaches described by Opik theory. 




Fig. 12. Level curves for the function / around the semi-major 
axis of Jupiter. 



5. Conclusion and perspectives 

In this paper a statistical study of the planetary perturbations 
on Oort cloud comets was carried out. The exploratory anal- 
ysis of the perturbations distributions based on order statistics 



indicated the tail behaviour as determinant feature. Following 
this idea, parametric inference for heavy-tail distributions was 
implement. The obtained results indicated that the perturbations 
following heavy-tail stable distributions that are not always sym- 
metric while tending to form a spatial pattern. This pattern is 
rather arrow-like shaped and is situated around the orbits of the 
big planets. A theoretical study was carried out, and it was ob- 
served that this pattern is similar with the theoretical curves de- 
rived from the Opik theory. The perturbations outside this arrow 
shaped region were not exhibiting a stable character and they 
were modelled by a family of distributions with regularly vary- 
ing tails. In both cases, stable and non-stable distributions, the 
modelling choices were confirmed by a statistical test. 

Clearly, these choices and the estimation parameter estima- 
tion procedures can be further improved. Nevertheless, the ob- 
tained results give good indications and also good reasons for 
developing a probabilistic methodology able to simulate such 
planetary perturbations. 
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ABSTRACT 

Aims. This paper tackles important aspects of comets dynamics from a statistical point of view. Existing methodology uses numerical 
integration for computing planetary perturbations for simulating such dynamics. This operation is highly computational. It is reason- 
able to wonder whenever statistical simulation of the perturbations can be much more easy to handle. 

Methods. The first step for answering such a question is to provide a statistical study of these perturbations in order to catch their 
main features. The statistical tools used are order statistics and heavy tail distributions. 

Results. The study carried out indicated a general pattern exhibited by the perturbations around the orbits of the important planet. 
These characteristics were validated through statistical testing and a theoretical study based on Opik theory. 

Key words. Methods: statistical; Celestial mechanics; Oort cloud 



1. Introduction 

Comet dynamics is one of the most difficult phenomena to 
model in celestial mechanics. Indeed their dynamics is strongly 
chaotic, thus individual motions of known comets are hardly re- 
producible for more than a few orbital periods. When the origin 
of comets is under investigation, one is thus constrained to make 
use of statistical tools in order to model the motion of a huge 
number of comets supposed to be representative of the actual 
population. Such statistical model should also be reliable on a 
time scale comparable to the age of the solar system. 

Due to their very elongated shapes, comet trajectories are af- 
fected by planetary perturbations during close encounters with 
planets. Such perturbations turn out to be the main mechanisms 
able to affect comet trajectories. Consequently, it is of major im- 
portance to model these perturbations in a way which is statisti- 
cally reliable and with the lowest cost in computing time. 

A direct numerical integration of a 6 bodies restricted prob- 
lem (Sun, Jupiter, Saturn, Uranus, Neptune, Comet) each time 
a comet enters the planetary region of the Solar System is not 
possible due to the cost in computer time. 

Looking for an alternative approach, we can take advantage 
of the fact that planetary perturbation on Oort cloud comets 
are uncorrected. In fact the orbital period of such comets are 
so much larger than those of the planets, that when the comet 
returns, the phases of the latter can be taken at random. Thus 
we can build a synthetic integrator a la Froeschle and Rickman 
(Froeschle & Rickman 1981) to speed up the modeling. The crit- 
icism by (Fouchard et al. 2003) to such an approach does not ap- 
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ply in the present case because, as just said, successive planetary 
perturbations on an Oort cloud comets are uncorrelated. 

The aim of this paper is to give a statistical description of 
a large set of planetary perturbations assumed to be representa- 
tive of those acting on Oort cloud comets entering the planetary 
region. To this purpose we use order statistics and heavy tails 
distributions. 

The rest of this paper is organised as follow. Section 2 is 
devoted to the presentation of the mechanism producing the data, 

1. e. the planetary perturbations and the statistical tools used to 
analyse the data. These tools are order statistics and heavy-tail 
distributions, that allow, respectively, the study and the modeling 
of the data distribution, with respect to its symmetry, skewness 
and tail fatness. The obtained results are shown and interpreted 
in the third section. The results are finally analysed from a more 
theoretical point of view using the Opik theory in Section 4. The 
paper closes with conclusions and perspectives. 

2. Statistical tools 

2.1. Data compilation 

By planetary perturbations, one intends the variations of the or- 
bital parameters between their values before entering the plan- 
etary region of the Solar System, i.e. the bary centric orbital 
element of the osculating cometary orbit (z,-, q t , cos cjj, 0,) r 
(where q, i, o>, Q. are the perihelion distance, the inclination, the 
argument of perihelion and the longitude of the ascending node 
and z = -1 la with a the semi-major axis), and their final val- 
ues (zf ,q /, cos if ,tOf,£lf) T , that is either when the comet is at 
its aphelion or when it is back on a keplerian barycentric orbit. 

Between its initial and final values, the system Sun + Jupiter 
+ Saturn + Uranus + Neptune + comet is integrated using the 
RADAU integrator at the 15th order (Everhart 1985) for a max- 
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imum of 2000 yrs. Then the planetary perturbation obtained 
through this integration is (Az = Zf — Zu Aq - qj - q t , A cos i = 
cos if - cos ;,, Aw = a>f- AQ. = Q.f- Q,) r . The detail on the 
numerical experiment used to perform the integrations may be 
found in Rickman et al. (2001). 

Repeating the above experiment with a huge number of 
comets (namely 9 600 000), one gets a set of planetary pertur- 
bations. The comets are chosen with uniform distribution of the 
perihelion distance between and 32 AU, cosine of the ecliptic 
inclination between -1 and 1 and argument of perihelion, longi- 
tude of the ascending node between and 360°. The initial mean 
anomaly is chosen such that the perihelion passage on its initial 
keplerian orbit occurs randomly with an uniform distribution be- 
tween 500 and 1 500 years after the beginning of the integration. 

In the present study, because the perturbations are mainly 
depending on g, and cos /, (Fernandez 1981), each perturbation 
is associated to the couple (cos /,, qi). Similarly, since the orbital 
energy is the main quantity which is affected by the planetary 
perturbations, we will consider only these perturbations here. 

Consequently, our data are composed by a set of triplets 
(cos ;,, qi, Z) where Z = Zf — Zi denotes the perturbations of the 
cometary orbital energy by the planets, and (cos qf) a point in a 
space denoted by K. In the following, we call Z the perturbation 
mark. 

2.2. Exploratory analysis based on order statistics 

Let Zi , . . . , Z„ be a sequence of independent identically dis- 
tributed random variables and let F(z) = P(Z < z),z e R be the 
corresponding cumulative distribution function. Let us consider 
also £„, the set of permutations on {1, . . . , n}. 

The order statistics of the sample (Z\ , . . . , Z„) is the rear- 
rangement of the sample in increasing order and it is denoted 
by (Z(\ A ), . . . , Z( Bi „)). Hence Z(i >ra ) < . . .,< Z( n , n ) and there exists 
a random permutation cr„ e S„ such that 

(Z(l,n), ■ ■ ■ , Z(n,n)) = (Zcr„(l), ■ ■ ■ , Zo- n (n))- (1) 

In the following, some classical results from the literature 
are presented (David 1981; Delmas & Jourdain 2006). If F is 
continuous, then almost surely Z(i „) < . . . , < Z(„ „) and the per- 
mutation cr„ in definition (1) is unique. If Z\ has a probability 
density /, then the probability density of the order statistics is 
given by 

n\l{zi<...z„}f(zi)...f(z„). 

A major characteristic of order statistics is that they allow 
quantiles approximations. The quantiles are one of the most easy 
to use tool for characterising a probability distribution. In prac- 
tice, the data distribution can be described by such empirical 
quantiles. 

Two important results are now presented. The first result 
shows how to compute empirical quantiles using order statistics. 
Let us assume that F is continuous and there exists an unique 
solution z q to the equation F(z) = q with q e (0, 1). Clearly, 
z q is the g-quantile of F. Let (k(n),n > 1) be an integers se- 
quence such that 1 > k(n) > n and lim„^oo — = q. Then the 
sequence of the empirical quantiles (Z(t( n ),n),n > 1) converges 
almost surely towards z q . 

The second result allows the computation of confidence in- 
tervals and hypothesis testing. If Z\ has a continuous probability 
density / such that f{z q ) > for q e (0, 1) and if it is supposed 
that k(ri) = nq + o( yjn), then Z^nXn) converges in distribution 



towards z q as it follows 

yfn(Z (k(nM - Zq) -> N |o, ^f j • 

The exploratory analysis we propose for the perturbation 
data sets is based on the computation of empirical quantiles. 
There are several reasons motivating such a choice. First, there 
is not too much a priori knowledge concerning the perturbations 
marks, except that they are distributed around zero and that they 
are uniformly located in K. This implies that very few hypothe- 
sis with respect to the data can be done. Clearly, in order to apply 
such an analysis the only assumptions needed are the conditions 
of validity for the central limit theorem. From a practical point of 
view, an empirical quantiles based analysis allows for checking 
the tails, the symmetry and the general spatial pattern of the data 
distribution. From a theoretical point of view, the mathematics 
behind this tool allow a rather rigorous analysis. 

2.3. Stable distributions models 

Stable laws are a rich class of probability distributions that allow 
heavy tails, skewness and have many nice mathematical proper- 
ties. They are also known in the literature under the name of a- 
stable, stable Paretian or Levy stable distributions. These mod- 
els were introduced by Levy (1925). In the following some basic 
notions and results on stable distributions are given (Borak et al. 
2005; Feller 1971; Samorodnitsky & Taqqu 1994). 

A random variable Z has a stable distribution if for any 
A,B>0, there is a C > and Del 1 such that 

AZi + BZ 2 =CZ + D, 

where Z\ and Z2 are independent copies of Z, and "=" denotes 
equality in distribution. 

A stable distribution is characterised by four parameters 
a e (0,2], B e [-1, 1], y > and 6 e R 1 and it is denoted 
by S a (B, y, 6). The role of each parameter is as it follows : a de- 
termines the rate at which the distribution tail converges to zero, 
B controls the skewness of the distribution, whereas y and 5 are 
the scale and shift parameters, respectively. Figure 1 shows the 
influence of these parameters on the distribution shape. 

The linear transformation of stable random variable is also a 
stable variable. If a e (0, 2), then E\Z\ P < 00 for any < p < a 
and E\Z\ P = 00 for any p > a. The distribution is Gaussian if 
a = 2. The stable variable with a < 2 has an infinite variance and 
the corresponding distribution tails are asymptotically equivalent 
to a Pareto law (Skorokhod 1961). More precisely 

lim^ ro z Q 'P{Z>z} = ^cr, 
lim^oc z a ¥{Z < -z) = ^£T. 

where cr = C a y a , C a = 2r{2 _^l( m i2) if a * 1 and C « = I 
elsewhere. The distribution is symmetric whenever B — 0, or 
skewed otherwise. In the case a < 1, the support of the distri- 
bution S a (B, y, 0) is the positive half-line when B = 1 and the 
negative half-line when B = -l.Ifa > 1, then the first order 
moment exists and equals the shift parameter 6. 

One of the technical difficulty in the study of stable distribu- 
tion is that except for a few cases (Gaussian, Cauchy and Levy), 
there is no explicit form for the densities. The characteristic 
function can be used instead, in order to describe the distribution. 
There exist numerical methods able to approximate the proba- 
bility density and the cumulative distribution functions (Nolan 
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Fig. 1. Influence of the parameters on the shape of a stable distribution : 

1997). Simulation algorithms for sampling stable distribution 
can be found in Borak et al. (2005); Chambers et al. (1976). 

Due to the previous considerations, parameter estimation 
is still an open and challenging problem. Several methods are 
available in the literature (Fama & Roll 1971; McCulloch 1986; 
Mittniketal. 1999; Nolan 2001; Press 1972). Nevertheless, these 
methods have all the same drawback, in the sense that the data 
is supposed to be a sample of a stable law. It is a well known 
fact, that if the data comes from a different distribution, the in- 
ference of the tail index may be strongly misleading. A solution 
to this problem is to estimate the tail exponent (Hill 1975) and 
then estimate distribution parameters if a e (0, 2]. 

Still, it remains to solve the problem of parameter estima- 
tion whenever the tail exponent is greater than 2. Under these 
circumstances, distributions with regularly varying tails can be 
considered. A random variable has a distribution with regularly 
varying tails of index a > if there exist p,q>0,p + q- 1 and 
a slowly varying function L, i.e lim 7 ^ M — 1 f° r anv ^ > 0, 
such that 

\im z ^ ca z a L(zW{Z> z) = p, m 
\\m z ^z°L{z)nZ < -z] = q. ( ' 

It is important to notice that the conditions (2) can be ob- 
tained from (3) whenever L(z) — 1/cr and p = (1 + f3)/2. 

The parameter estimation algorithm proposed by Davydov 
& Paulauskas (1999, 2004) is constructed under the assumption 
that the sample distribution has the asymptotic property (2). The 
algorithm gives three estimated values a,j3, cr. The 6 can be com- 
puted easily whenever a > 1, by approximating it using the em- 
pirical mean of the samples. This parameter estimation method 
can be used for stable distribution and in this case, a should in- 
dicate positive values lower than 2. In the same time, the strong 
point of the method is that it can be used for data not follow- 
ing stable distributions. In this case the data distribution is as- 
sumed to have regularly varying tails. The weak point of this 
algorithm is that in this case, it does not give indications con- 
cerning the body of the distribution. Nevertheless, in both cases, 
this method allows a rather complete characterisation of a wide 
panel of probability distributions. The code implementing the 
algorithm is available just by simple demand to the authors. 

3. Results 

3. 1 . Empirical quantiles 

The lack of stationarity of the perturbations marks imposes the 
partitioning of the location space in a finite number cells. Let us 
consider such a partition K = U" =1 ^T,-. The cells Kj are disjoint 
and they all have the same volume. The size of volume has to 
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i) fi parameter, b) a, y and S parameters. 

be big enough in order to contain a sufficient number of pertur- 
bations. In the same time, the volume has to be small enough to 
allow stationarity assumptions for the perturbations marks inside 
a cell. After several trial and errors, we have opted for a partition 
made of square cells K t , having all the same volume 0.1x0.1 AU, 
so that each cell contains about 1 500 perturbations. 

We were interested in three questions concerning the pertur- 
bations marks distributions. The first two questions are related 
to the tails and the symmetry of the data distribution. The third 
question is related to a more delicate problem. It is a well known 
fact that the perturbations locations follow an uniform distribu- 
tion in K. Nevertheless, much few is known about the spatial dis- 
tribution of the perturbations marks, except that they are highly 
dependent on their corresponding locations. So, the third ques- 
tion to be formulated is the following : do the distributions of the 
perturbations marks exhibit any pattern depending on the pertur- 
bation location ? 

For this purpose, empirical g-quantiles were computed in 
each cell. The most part of these values were indicating that the 
perturbations marks are distributed around the origin, while no 
particular spatial pattern is exhibited in the perturbation location 
space. 

On the other hand, the situation is completely different for 
extremal g-values such as : 0.01,0.05,0.95,0.99. These quan- 
tiles were indicating rather important values around the semi- 
major axis of each planet. In order to check if these values 
may reveal heavy tail distributions, the difference based indicator 
Zq—n q was built. The first term of this indicator represents an em- 
pirical g-quantile. The second term is the theoretical g-quantile 
of the normal law with mean and standard deviation given by 
Zo.50 and 0.5(zo.84 - zo.i6)- Hence, for values of q approaching 1, 
positive values of the indicator may suggest heavy-tail behaviour 
for the data. Clearly, this indicator may be used also for quantiles 
approaching 0. In this case, it is the negative sign that reveals the 
fatness of the distribution tail. 

In Figure 2 the values obtained for the difference indicator 
Z0.99 - «o.99 are shown. It can be observed that its rather impor- 
tant values appeared whenever the perturbations are located in 
the vicinity of a planet orbit. All these values tend to form a spa- 
tial pattern similar to an arrow-like shape. As it can be observed, 
this shape is situated around the planet orbit and it is pointing 
from the right to left. It tends to vanish, while the cosine of the 
inclination angle approaches -1. The prominence of this arrow 
shape clearly depends on the closest planet : bigger the planet is, 
sharper is the arrow-like shape. This can be observed by looking 
at the change of values for the difference indicator with respect 
the size of the planet. These observations fulfil some good sense 
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expectations : the comets perturbations tend to be more impor- 
tant whenever a comet cross the orbit of a giant planet. 

Since these phenomena are observed for extremal 
g-quantiles, they indicate that the distribution tails may 
be an important feature for the data. Hence, a statistical model 
for the data should be able to catch these characteristics of the 
perturbations marks. 

Empirical quantiles can be also used in straightforward way 
as symmetry indicators of the data distribution. Clearly, by just 
checking whenever the difference z q - \zi- q \ tends to 0, this may 
suggest a rather symmetric data distribution. Figure 3 shows 
the computation of such differences for each data cell. The 
values obtained are rather small all over the studied region. 
Nevertheless, there are some regions and especially around the 
Jupiter's orbit we may suspect the data distributions are a little 
bit skewed. Still, since the perturbations have rather small nu- 
merical values, assessing symmetry using the proposed indicator 
has to be done cautiously. 

It is reasonable to expect a more reliable answer concerning 
this question by using a statistical model. Clearly, such a model 
should be able to catch the symmetry of the data distribution as 
well. 



n 



y 



a) 



Fig. 3. Exploring symmetry using empirical quantiles difference z .99 _ 
Izo.oil for the perturbations marks around Jupiter. Axis are as for Fig. 2 



The central limit theorem available for the order statistics al- 
lows the construction of an hypothesis test. Since our analysis 
leads us towards heavy-tailed distributions models, as a precau- 
tion, a statistical test was performed to verify if a rather sim- 
pler model can be fitted to the data. The normality assumption 
was considered as null hypothesis for the test. The test was per- 
formed for the data in each cell, by considering that the normal 
distribution parameters are given by the empirical quantiles as 
expained previously. The p-values were computed using a x 1 
distribution. In this context, the local normality assumption for 
the perturbation marks is globally rejected. Figure 4 shows the 
result of testing the normality of the zo.95 empirical quantile com- 
puted around the Jupiter's orbit. 

Indeed, there exist regions where the normality assumptions 
cannot be rejected for the considered quantile. Still, the regions 
where this hypothesis is rejected clearly indicate that normality 
cannot be assumed entirely. Therefore, a parametric statistical 
model has to be able to reflect this situation : indicate whenever 
is the case how "heavy" or how stable are the distributions tails. 

The only parameter used during this exploratory analysis 
was the partitioning of the location domain K. There is one more 
question to answer : do the obtained results depend on the pat- 
terns exhibited by the data, or they are just a consequence of 
the partitioning in cells of the data locations ? To answer this 
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Fig. 4. p— values computed for testing the normality of the empirical 
quantiles zo.95 around Jupiter. 



question, a bootstrap procedure and a permutation test were im- 
plemented (Davison & Hinkley 1997). 

Bootstrap samples were randomly selected by uniformly 
choosing 20% from the entire perturbations data set. Difference 
indicators were computed for this special data set. This opera- 
tion was repeated 100 times. At the end of the procedure, the 
empirical means of the difference indicators were computed. In 
Figure 5a the bootstrap mean of the indicator zo.99 - wo.99 around 
Jupiter's orbit is showed. As expected, the same pattern is ob- 
tained as in Figure 2a : important values are grouped around the 
planet's orbit while exhibiting an arrow-like shape pointing from 
right to left. 

The permutation test follows the same steps as the bootstrap 
procedure except that the perturbations are previously permuted. 
This means that all the perturbations are modified as it follows : 
for a given perturbation, its mark is kept while its location is 
exchanged with the location of another randomly chosen per- 
turbation. This procedure should destroy any pre-existing struc- 
ture in the data. In this case, we expect that applying a boot- 
strap procedure on this new data set will indicate no relevant pat- 
terns. In Figure 5b the result of such permutation test is showed. 
The experiment was carried out in the vicinity of Jupiter's orbit. 
After permuting the perturbations as indicated, the previously 
described bootstrap procedure was applied in order to estimate 
bootstrap means of the difference indicator zo.99 - "0.99- The re- 
sult confirmed our expectations, in the sense that no particular 
structure or pattern is observed. This clearly indicates, that the 
analysis results were due mainly to the original data structure 
and not to the partitioning of the perturbations location domain 
in cells. 

In the same time, the permutation test is also a verification 
of the proposed exploratory methodology. This methodology de- 
pends on a precision parameter for characterising the hidden 
structure or pattern exhibited by the data. Still, whenever such 
a structure does not exist at all, the present method detects noth- 
ing. 

3.2. Inference using heavy-tail distributions 

The empirical observations of the perturbations marks distribu- 
tions indicated fat tails and skewness behaviour. This leptokurtic 
character of the perturbation distributions was observed espe- 
cially in the vicinity of the planets orbits. In response to this em- 
pirical evidence heavy-tail distribution modelling was chosen. 

The same cell partitioning as for the exploratory analysis is 
maintained. The previously mentioned algorithm for estimating 
stable laws parameters was run for the data in each cell. 
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Fig. 2. Empirical quantiles based difference indicator zo.gg - "0.99 f° r the perturbation marks around the big planets : a) Jupiter, b) Saturn, c) Uranus 
d) Neptune. For each diagram the y-axis correspond to initial perihelion distance in AU, and the x-axis to cosine of the inclination. We recall that 
the respective semi-major axis of the four giant planets are: aj = 5.2 AU, a s = 9.6 AU, a u = 19.2 AU, a N = 30.1 AU. 




a) b) 



Fig. 5. Validation of the analysis based on the computation of the difference indicator zo.99 - "0 99 around Jupiter : a) bootstrap procedure ; b) 
permutation test. 



In Figure 6 the estimation result of the tail exponent is 
shown. Clearly, it can be observed a region formed by the cells 
corresponding to estimated a values lower than 2. This kind of 
region may be located around each orbit corresponding to a big 
planet. The shape of this region is less picked than the region 
obtained using empirical quantiles. Still, the two results are co- 
herent. Both results indicate that the heavy-tailed character of the 
perturbations distributions exhibits a spatial pattern. This spatial 
pattern is located around the orbits of the big planets. 

The skewness of the data distribution can be analysed by 
looking at the results shown in Figure 7. Indeed, it can be ob- 
served that there are cells containing perturbations following a 
skewed distribution. The obtained results indicate neither the 
presence of a pattern by such distributions, nor the presence of 
such a pattern around the orbits of the big planets. 

The estimation results for the cr and 6 parameters are pre- 
sented in Figure 8. The scale parameter indicates how heavy are 
the distribution tails. In Figure 8a, it may be observed that the 
most important values of cr tend to form a spatial pattern sim- 
ilar with the patterns formed by the difference indicator based 
on order statistics and the tail exponent, respectively. The results 




Fig. 7. Estimation result of the skewness parameter /3 for the perturba- 
tions marks around Jupiter. 



obtained for the 6 parameter indicate that a shift of the perturba- 
tion may exist around the orbit of the corresponding big planets. 

In order to check these results a statistical test using the cen- 
tral limit theorem for order statistics was built. Clearly, this result 
can be used in order to verify if the empirical quantiles from a 
cell are coming rather from the distribution characterised by the 
parameters previously estimated. Figure 9 shows the result of a 
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Fig. 8. Estimation result of the scale parameter cr and shift parameter 6 for the perturbation marks around Jupiter. 



test verifying that the zo.99 quantiles around the Jupiter's orbit 
are originated from a heavy-tail distribution, while the quantiles 
outside this region are coming rather from Pareto distribution. 
It can be observed that high values for the p-values are spread 
around the entire region : for 81.5% of the cells we cannot re- 
ject the null hypothesis. Clearly, this result shows a far better 
characterisation of the distribution tails of the perturbations than 
the test for the normality assumption performed in the preceding 
section. 

The previous test certifies the perturbations distributions tails 
exhibit a stable or regular variation behaviour. If the perturba- 
tions are close to the orbit of a big planet then they have rather 
a stable behaviour. Figure 10 shows the p-values of a^- 2 -test 
implemented for the perturbations with estimated tail exponent 
a < 2. This test allows to check the perturbations also for their 
distribution body. It can be observed that almost in all these re- 
gions the assumption of stable distributions is not rejected. 

For the perturbations with a tail exponent greater than 2, an 
alternative family of distributions with regularly varying tails 
was considered for modelling. Its expressions is given below : 




Fig. 9. p— values computed for testing if the empirical quantiles Z0.99 
around the Jupiter's orbit are originated from a heavy-tail distribution. 



with C K , a the normalising constant, k the scale parameter, oj the 
location parameter and a the tail exponent. 

The parameter estimation for such distributions was done 
in several steps. First, the tail exponent a was considered ob- 
tained from the previous algorithm. Second, the location param- 
eter a) was estimated by the empirical mean of the data samples. 
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Fig. 10. p— values of ax 2 statistical test for the perturbations with a < 2 around the big planets : a) Jupiter, b) Saturn, c) Uranus, d) Neptune. 



Finally, the normalising constant C KA and the scale parameter k 
were estimated using the method of moments. 

A x 1 statistical test was done for the perturbations with 
a > 2. The null hypothesis considered was that the considered 
perturbations follow a regularly varying tails distribution (4) 
with parameters given by the previously described procedure. 
The obtained p-values are shown in Figure 1 1. It can be noticed 
that in the majority of considered cells the null hypothesis is not 
rejected. 

4. Discussion and interpretation 

Some of the features present in the Figures can be explained 
in the framework of the analytical theory of close encounters 
(Opik 1976; Greenberg et al. 1988; Carusi et al. 1990; Valsecchi 
& Manara 1997). 

Let us consider the magnitude of the perturbations in the 
vicinity of a = aj = 5.2 AU (Jupiter). The colour coding of 
the Figure 2 represents the magnitude P of the perturbation, 
corresponding to 

1 1 

Z= + — och f -hj (5) 

fly a,- 

where a and h are respectively the orbital semi-major axis and 
the orbital energy of the heliocentric keplerian motion of the 
comet. The subscripts , and f stand, respectively, for initial and 
final, i.e., before and after the interaction with Jupiter). 

Perturbations at planetary encounters are characterised by 
large and in general asymmetric tails, as was shown by various 
authors (Everhart 1969; Oikawa & Everhart 1979; Froeschle & 
Rickman 1981); an analytical explanation of these features was 
given by Carusi et al. (1990) and by Valsecchi et al. (2000), and 
the consequences on the orbital evolution of comets was dis- 
cussed by Valsecchi & Manara (1997). 

Let us consider the case of parabolic initial orbits (our orbits 
are in fact very close to parabolic). In the q-cos i plane, the con- 
dition for the tails of the energy perturbation distribution to be 



symmetric is: 

1-3 + 2 -yj2q/a p cos i 

2 ^3 — 2 sj2q/a p cos/ 

where a p is the orbital semi major axis of the planet encountered. 

However, the finite size of the available perturbation sam- 
ple must be taken into account, as the tails would become suf- 
ficiently populated to show any asymmetry only for very large 
samples. 

A way to take this effect into account is to consider that in 
different regions of the q-cos i plane the probability p for the 
comet on a parabolic orbit to pass within a given unperturbed 
distance b from the planet would be, according to Opik (1976): 

yi J3 — 2 yj2qla p cos/ 

P = -j , 

a p Ji sin i y2 - 2q/a p 

To take into account the size of the perturbation, we consider 
that the angle y by which the planetocentric velocity of the comet 
is rotated is given by: 

y a p m p 

tan^ = -; 

^ bniQ (3—2 sj2q/a p cos z) 

we then define a function / as: 

f = pton~ (6) 
bm p 

a p m & n sin i ^J2 — 2q/a p ^3 — 2 sJ2q/a p cos"/ 

Figure 12 shows the level curves of /; as can be seen, in 
it are reproduced the main features of Figure 2. The arrow-like 
shape observed during the statistical study can be now observed 
on the definition domain imposed by (6). This strenghten our 
interpretation of the features of Fig. 2 as due to the geometry of 
close approaches described by Opik theory. 
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Fig. 11. p— values of ax 2 statistical test for the perturbations with a > 2 around the big planets : a) Jupiter, b) Saturn, c) Uranus, d) Neptune. 




Fig. 12. Level curves for the function / around the semi-major axis of 
Jupiter. 



5. Conclusion and perspectives 

In this paper a statistical study of the planetary perturbations 
on Oort cloud comets was carried out. The exploratory anal- 
ysis of the perturbations distributions based on order statistics 
indicated the tail behaviour as determinant feature. Following 
this idea, parametric inference for heavy-tail distributions was 
implement. The obtained results indicated that the perturbations 
following heavy-tail stable distributions that are not always sym- 
metric while tending to form a spatial pattern. This pattern is 
rather arrow-like shaped and is situated around the orbits of the 
big planets. A theoretical study was carried out, and it was ob- 
served that this pattern is similar with the theoretical curves de- 
rived from the Opik theory. The perturbations outside this arrow 
shaped region were not exhibiting a stable character and they 
were modelled by a family of distributions with regularly vary- 
ing tails. In both cases, stable and non-stable distributions, the 
modelling choices were confirmed by a statistical test. 

Clearly, these choices and the estimation parameter estima- 
tion procedures can be further improved. Nevertheless, the ob- 
tained results give good indications and also good reasons for 



developing a probabilistic methodology able to simulate such 
planetary perturbations. 
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